PaRSEC now allows DSLs to free the gpu task #307
devreal wants to merge 5 commits into TESSEorg:master
This doesn't work as I thought. PaRSEC releases the task containing the gpu_task structure before the gpu_task itself is released, so we end up with fields overwritten prematurely.
I take back what I said earlier. The error was somewhere else and not in this PR. Ready for review.
| set(TTG_TRACKED_CATCH2_VERSION 3.5.0) | ||
| set(TTG_TRACKED_MADNESS_TAG 93a9a5cec2a8fa87fba3afe8056607e6062a9058) | ||
| set(TTG_TRACKED_PARSEC_TAG 58f8f3089ecad2e8ee50e80a9586e05ce8873b1c) | ||
| set(TTG_TRACKED_PARSEC_TAG a9ab33d8287578c68c0349662352f280bc83e2c0) |
Too many things that we need are still missing in PaRSEC, so this tag would only last for one PR:
- Use red-black-tree in zone_malloc ICLDisco/parsec#710
- Use 64bit integer when computing the ordered list pivot ICLDisco/parsec#706
- Provide mechanism to discard data ICLDisco/parsec#695
- Offload device task release to worker threads ICLDisco/parsec#687 (or related)
- Make GPU manager skip records when nothing scheduled on input stream ICLDisco/parsec#681
- Topic/cuda aware communications ICLDisco/parsec#671
Maybe 4.1 will work for us.
ttg/ttg/parsec/ttg.h
Outdated
| tc.out[i] = gpu_task->flow[i]; | ||
| /* set up the device task */ | ||
| parsec_gpu_task_t *gpu_task = task->dev_ptr->gpu_task; | ||
| /* TODO: needed? */ |
You should not need this: you construct the list_item and then set the rest of the gpu_task fields to default values.
ttg/ttg/parsec/ttg.h
Outdated
| parsec_task_class_t& tc = task->dev_ptr->task_class; | ||
| // input flows are set up during register_device_memory as part of the first invocation above | ||
| for (int i = 0; i < MAX_PARAM_COUNT; ++i) { |
Why is the upper bound here always MAX_PARAM_COUNT?
Because we don't know how many device inputs the application will give us. We could put a stop there but the impact will be marginal.
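To make the trade-off concrete, here is a minimal sketch of the two loop shapes being discussed. Everything in it is hypothetical: `kMaxParamCount` is a placeholder for MAX_PARAM_COUNT (the real value comes from the PaRSEC headers), and `flow_sketch` is a stand-in record, not a PaRSEC type. The early-stop variant also assumes that used slots are packed at the front of the array, which this thread does not confirm.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Placeholder bound standing in for MAX_PARAM_COUNT (assumed value, illustration only).
constexpr std::size_t kMaxParamCount = 16;

// Hypothetical flow record: a null data pointer marks an unused slot.
struct flow_sketch {
  const void* data = nullptr;
};

using flow_array = std::array<flow_sketch, kMaxParamCount>;

// Copies every slot, used or not, and reports how many slots were visited.
std::size_t copy_all_slots(const flow_array& in, flow_array& out) {
  std::size_t visited = 0;
  for (std::size_t i = 0; i < kMaxParamCount; ++i) {
    out[i] = in[i];
    ++visited;
  }
  return visited;
}

// Stops at the first unused slot; only valid if used slots are packed at the front.
std::size_t copy_used_slots(const flow_array& in, flow_array& out) {
  std::size_t visited = 0;
  for (std::size_t i = 0; i < kMaxParamCount && in[i].data != nullptr; ++i) {
    out[i] = in[i];
    ++visited;
  }
  return visited;
}
```

With a small fixed bound, the difference between the two is a handful of pointer copies per task, which is consistent with the expectation that the impact would be marginal.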
We can allocate the GPU task inside the task structure and avoid an extra allocation. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
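As a rough illustration of what "inside the task structure" means, the sketch below contrasts a task that points to a separately allocated GPU task with one that embeds it as a member. The types `gpu_task_sketch`, `task_with_pointer`, and `task_with_embedded` are hypothetical stand-ins and do not reflect the actual layout of the TTG or PaRSEC structures.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the runtime's GPU task descriptor.
struct gpu_task_sketch {
  int num_flows = 0;
  void* stream = nullptr;
};

// Variant 1: the host task points to a separately allocated GPU task,
// so creating a task costs two heap allocations.
struct task_with_pointer {
  std::uint64_t key = 0;
  std::unique_ptr<gpu_task_sketch> gpu_task;
};

// Variant 2: the GPU task is a member, so one allocation covers both.
struct task_with_embedded {
  std::uint64_t key = 0;
  gpu_task_sketch gpu_task;
};

// Returns true if the GPU task's storage lies inside the host task object.
template <typename Task>
bool gpu_task_is_inline(const Task& task, const gpu_task_sketch* gpu) {
  const auto base = reinterpret_cast<std::uintptr_t>(&task);
  const auto addr = reinterpret_cast<std::uintptr_t>(gpu);
  return addr >= base && addr + sizeof(gpu_task_sketch) <= base + sizeof(Task);
}
```

The embedded variant ties the GPU task's lifetime to the enclosing task: the enclosing task must not be released while the runtime still uses the embedded structure.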
f6c8441 to 2c1323a
With ICLDisco/parsec#694 PaRSEC will support dynamically allocated flows based on the application-managed gpu task structure. This allows us to ditch the extra task class structure and lets us cut down loops over MAX_PARAM_COUNT. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
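The general idea can be sketched as follows, assuming the application knows how many inputs it registered. The names `dyn_flow`, `device_task_sketch`, and `make_device_task` are invented for illustration and are not the interface introduced by ICLDisco/parsec#694; `std::vector` merely stands in for whatever storage the application-managed structure actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flow record.
struct dyn_flow {
  const void* data = nullptr;
};

// Hypothetical application-managed device task: it owns exactly as many
// flows as inputs were registered, instead of a fixed-size array.
struct device_task_sketch {
  std::vector<dyn_flow> flows;
};

// Builds a device task with one flow per registered input.
device_task_sketch make_device_task(const std::vector<const void*>& inputs) {
  device_task_sketch task;
  task.flows.reserve(inputs.size());
  for (const void* input : inputs) {
    task.flows.push_back(dyn_flow{input});
  }
  return task;
}

// Loops run over the actual flow count rather than a compile-time maximum.
std::size_t count_bound_flows(const device_task_sketch& task) {
  std::size_t bound = 0;
  for (const dyn_flow& flow : task.flows) {
    if (flow.data != nullptr) ++bound;
  }
  return bound;
}
```

Because the flow count travels with the task, loops no longer need a sentinel or a fixed upper bound to know where to stop.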
…rsec-gpu-task-free
This PR now depends on ICLDisco/parsec#694 going into PaRSEC and needs some more testing.
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
3c26ab5 to a55f24a
a2878ce to a49eb91